System and method for transmitting sequence dependent messages in a required sequence.

A computing system includes a plurality of nodes that are connected by a communications network.
Each node comprises a communications interface that enables an exchange of messages with other
nodes. A ready queue is maintained in a node and includes a plurality of message entries, each message
entry indicating an output message control data structure. The node further includes memory for
storing a plurality of output message control data structures, each including one or more chained
further control data structures that define data comprising a message or a portion of a message that is
to be dispatched. Control data structures that are chained from an output message control data
structure exhibit a sequence dependency. A processor is controlled by the ready queue and enables
dispatch of portions of the message designated by an output message control data structure and
associated further control structures. The processor prevents dispatch of one portion of a message
prior to dispatch of another portion of the message upon which the first portion is dependent even if
message transmissions are interrupted.

This invention relates to a system and method for transmitting sequence dependent messages in a
required sequence and, more particularly, to apparatus and procedures for enabling all messages in a
network transmission to be delivered in a predetermined order, even in the event that the transmission
is discontinued and restarted at a later time.

The prior art has handled ordering of transmitted messages in a multi-nodal network through use of various
instrumentalities. Most systems use high-level protocols to achieve message ordering. Such protocols are
complicated, require extensive software processing and, in many such systems, protocol processing dominates
the communication overhead. Examples of such software-based high-level protocols can be found in U.S. Patents
5,086,428 to Perlman et al. and 5,151,899 to Thomas et al.

Perlman et al. employ packets which contain data identifying an originating node, a sequence number
which indicates a packet's place in a sequence of packets, and an age value. A data base at a receiving node
is updated by newly received packets. However, the nodes themselves are reset if packets currently in the
network exhibit later sequence numbers than newly received packets. Thomas et al. describe a system which
tracks sequence numbers in packets transmitted over a data communication network. Thomas et al. employ
a bounded sequence number window and ignore any packet number below or above the window. A received
packet map is maintained to keep track of which sequence numbers have been received and to enable filtering
out of duplicate sequence numbers.

Other prior art systems achieve ordering of messages through the use of hardware protocol engines which
implement standard protocols. This approach often results in overly complicated hardware because a standard
protocol engine does not exploit properties of a specific network but rather is required to interface with a
plurality of networks. Such hardware protocol engines also create new hardware/software interfaces and usually
become a new source of software overhead. Other parallel computer systems that communicate through an
interconnection network allow each node in the network to have, at most, one outstanding message at a time.
This results in a decrease in efficiency of the use of network bandwidth.

Accordingly, the present invention provides a node for transmitting sequence dependent messages in a
required sequence in a computing system comprising a plurality of like nodes connected by a communications
network comprising: communication interface means for exchanging messages with other nodes; message
queue means for arranging a plurality of message entries in a queue, each message entry comprising a pointer
to an output message control structure; memory for storing a plurality of output message control structures,
each output message control structure indicating one or more chained further message control structures that
define message portions which together comprise a message, each output message control structure further
enabled to be chained to other message output control structures, a plurality of message control structures
that are chained exhibiting a sequence dependency; and processor means controlled by message entries in
said message queue means for dispatching at least first and second messages under control of first and
second chained output message control structures, respectively, said processor means enabling dispatch of said
second message only after successful transmission of said first message upon which said second message
portion is sequence dependent.

Thus the invention advantageously provides an improved message delivery system which guarantees that
an ordered series of messages will be received in the required order and wherein minimal message
handshaking is employed.

A further advantage of the invention is that an improved message delivery system in an internodal network
is provided wherein hardware is employed to control message ordering and transmission.

An embodiment provides a computing system including a plurality of nodes that are connected by a
communications network. Each node comprises a communications interface that enables an exchange of
messages with other nodes. A ready queue is maintained in a node and includes a plurality of message entries,
each message entry indicating an output message control data structure. The node further includes memory
for storing a plurality of output message control data structures, each including one or more chained further
control data structures that define data comprising a message or a portion of a message that is to be
dispatched. Control data structures that are chained from an output message control data structure exhibit a
sequence dependency. A processor is controlled by the ready queue and enables dispatch of portions of the
message designated by an output message control data structure and associated further control structures. The
processor prevents dispatch of one portion of a message prior to dispatch of another portion of the message upon
which the first portion is dependent even if message transmissions are interrupted.

An embodiment of the present invention will now be described, by way of example only, with reference to
the accompanying drawings in which:

Fig. 1 is a block diagram illustrating a nodal disk array for a host processor.

Fig. 2 is a block diagram of an exemplary node employed in the system of Fig. 1.

Fig. 3a is a diagram showing hardware and software control blocks that enable data messages to be
received and stored.

Fig. 3b is a diagram showing hardware and software control blocks that enable data messages to be
compiled and transmitted.

Fig. 3c is a diagram showing hardware and software control blocks that enable control messages to be
compiled and transmitted.

Fig. 3d is a diagram showing hardware and software control blocks that enable control messages to be
received and stored.

Fig. 4 is a combined hardware/software block diagram which illustrates the operation of a node.

MESSAGE SEQUENCING PROCEDURE

Each node in a multi-node system includes multiple logical input ports and multiple logical output ports.
Message control is exerted by hardware-based input and output ports in the node. A communication
transmission (hereafter referred to as a "transmission") is a sequence of messages sent from an output port to an
input port. A transmission may be continuous or may occur in discontinuous segments. Logically, multiple
transmissions can be active at the same time. Separate communication paths are provided for control and data
messages.

In each node, the hardware manages two levels of outgoing message queues. Active transmissions form
a first level queue, and there may be a plurality of these first level queues, each one exhibiting a different
priority. Each entry in a queue is the head of a second level queue which links together all messages belonging
to a corresponding transmission.

Queues of transmissions are managed so as to enable higher priority messages to be dispatched before
lower priority messages. Message ordering is not assured at the transmission queue level (i.e., the first level
queue). In other words, different transmissions can be implemented in any order so long as all messages within
a transmission (in the second level queue) are transmitted in order. If, during a transmission of messages or
message portions from a second level queue, an interrupt occurs, the hardware controlling the queues moves
on to another transmission without having to waste network bandwidth. At such time, the hardware moves to
a next transmission in the first level queue, if any, or proceeds to a next lower priority queue. The discontinued
transmission is again reached after the hardware has dispatched or attempted dispatch of other transmissions
residing in the first level queue.
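
The two-level queue discipline described above can be illustrated with a minimal Python sketch. The names `Transmission`, `dispatch_all` and `can_send` are hypothetical and do not appear in the specification. The sketch assumes that every interruption is transient and, as a simplification, keeps revisiting an interrupted transmission within its own priority level before moving to the next lower level.

```python
from collections import deque


class Transmission:
    """Second level queue: the messages of one transmission, kept in order."""

    def __init__(self, name, messages):
        self.name = name
        self.pending = deque(messages)


def dispatch_all(priority_queues, can_send):
    """Drain first level queues from highest priority to lowest.

    priority_queues: list of deques of Transmission, highest priority first.
    can_send(tx, message) -> bool is an assumed stand-in for the network;
    returning False models an interrupt during the transmission.
    Returns (transmission name, message) pairs in dispatch order.
    """
    sent = []
    for level in priority_queues:
        while level:
            tx = level.popleft()
            # Messages inside one transmission always leave in order.
            while tx.pending and can_send(tx, tx.pending[0]):
                sent.append((tx.name, tx.pending.popleft()))
            if tx.pending:
                # Interrupted: move on and revisit this transmission later.
                level.append(tx)
    return sent


# Usage: transmission A is interrupted once before its second message.
blocked = {("A", "a2")}


def can_send(tx, message):
    key = (tx.name, message)
    if key in blocked:
        blocked.discard(key)  # the interruption happens only once
        return False
    return True


high = deque([Transmission("A", ["a1", "a2", "a3"]),
              Transmission("B", ["b1", "b2"])])
low = deque([Transmission("C", ["c1"])])
order = dispatch_all([high, low], can_send)
print(order)
```

In this run transmission B overtakes the interrupted transmission A, yet the messages of A still leave in their original order, and the lower priority transmission C is served last.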

For each message dispatched, the nodal hardware establishes a route between the source node and the
destination node and transmits the contents of a message from a second level queue. Only when an
acknowledgement is received that the message has arrived at the destination node does the source node send a
next message from the second level queue. This protocol enables an ordering of messages from the second level
queue. Additional handshaking between the source and destination nodes is avoided to reduce network
latency.

If an acknowledgement is not received, the source node retries the transmission, possibly causing
message duplication at the destination node. Message software at the destination node detects and drops
duplicated messages by using a session-level sequence number. The source node software maintains a sequence
number for each transmission. This number is increased by one for every message dispatched during the
transmission. The source node embeds this number in every message that is dispatched as a portion of a
transmission. The destination node remembers the sequence number of the last message that has arrived for the
transmission. If a sequence number of a new message appears at the destination node that is greater than
that of the last message in the transmission, it is used. Otherwise, it is a duplicate message and is dropped.
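
The duplicate-detection rule can be sketched in a few lines of Python. This is only an illustration: the class names `SourceSession` and `DestinationSession` are hypothetical, and the sketch assumes that sequence numbers start at 1 and that the destination treats 0 as "nothing received yet", neither of which is stated in the text.

```python
class SourceSession:
    """Source side: one sequence counter per transmission."""

    def __init__(self):
        self.last_used = {}  # transmission id -> last sequence number used

    def stamp(self, tx_id, payload):
        # Increase the number by one for every message dispatched.
        seq = self.last_used.get(tx_id, 0) + 1
        self.last_used[tx_id] = seq
        return (tx_id, seq, payload)


class DestinationSession:
    """Destination side: remembers the last sequence number per transmission."""

    def __init__(self):
        self.last_seen = {}  # transmission id -> last accepted sequence number
        self.delivered = []

    def receive(self, tx_id, seq, payload):
        """Accept the message if its number is greater than the last one seen."""
        if seq > self.last_seen.get(tx_id, 0):
            self.last_seen[tx_id] = seq
            self.delivered.append(payload)
            return True
        return False  # duplicate caused by a retry: drop it


# Usage: the first message is retried because its acknowledgement was lost.
src = SourceSession()
dst = DestinationSession()
m1 = src.stamp("T1", "first")
m2 = src.stamp("T1", "second")
results = [dst.receive(*m1), dst.receive(*m1), dst.receive(*m2)]
print(results, dst.delivered)
```

Because the source only advances after an acknowledgement, a single remembered number per transmission is enough here; no window or received-packet map is needed.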

Hereafter, the nodal array and node structures will be described with reference to Figs. 1 and 2. The
hardware and software control data structures required to implement the invention will be described with
reference to Figs. 3a-3d, and the overall operation of the hardware and software to implement the message
protocol of the invention hereof will be described in relation to Fig. 4.

NODAL ARRAY AND NODE STRUCTURES

Fig. 1 illustrates a disk drive array 10 configured, for example, as a multi-node network. Nodes A and D
are data storage nodes that connect to coupled disk drives 12, 14 and 16, 18, respectively. While only four
disk drives are shown, one skilled in the art will realize that disk drive array 10 can include many more disk
drives. A pair of communication interface nodes B and C provide input/output communication functions for disk
drive array 10. Host processors are coupled to nodes B and C via communication links. Disk drive array 10
further includes a cache node E which provides a temporary storage facility for both input and output message
transfers. Disk drive array 10 is expandable by addition of further nodes, all of which are interconnected by a
communication network 20.

Each of nodes A-E in Fig. 1 is configured in a standard node arrangement shown in Fig. 2. A node includes
a microprocessor 22 that controls the overall functions of the node. A memory interface module 24 controls
communications between microprocessor 22 and a plurality of memory modules within the node. Memory
interface module 24 also includes input/output hardware 25 for handling of control messages. Control messages
are stored in a connected control store 26 which also contains code that controls the operation of
microprocessor 22. Among other control code contained within control store 26 is an input port table 28 and an
output port table 30. As will become apparent from the description below, a node includes many logical
input/output ports, and an input port table 28 and an output port table 30 are provided for each physical
input/output port. Entries in those tables correspond to the logical input/output ports.

The node of Fig. 2 includes a plurality of disk drives 32 (only one is shown) that are connected via device
interfaces 34 to memory interface 24 and a data buffer interface 35. Data buffer interface 35 connects a data
buffer 36 to a network interface 37. Data buffer 36 provides buffering functions for both incoming and outgoing
data messages (as contrasted to control messages). Buffer interface 35 further includes input/output hardware
ports 38 for handling of received data. Input/output hardware ports 38 in buffer interface 35 and input/output
hardware ports 25 in memory interface 24 are controlled by entries in input port tables 28 and output port tables
30 in control store 26. Network interface 37 provides interface functions for both incoming and outgoing
message transfers.

Operations within the node of Fig. 2 are controlled by software-generated control blocks. For any read or
write action, a plurality of control blocks are assigned by software working in conjunction with microprocessor
22 to enable setup of the hardware within the node in accordance with a required action. For any single read
or write, the software assigns a plurality of control blocks. Each control block includes at least one parameter
required to enable a setup action by the hardware that is required during the read or write.

Control block data structures enable the node of Fig. 2 to assemble a message that is to be transmitted
to another node, to a disk drive or to a host processor. The message may be assembled through use of
a plurality of control blocks that are "chained" so that one control block includes a pointer to a next control
block. Control blocks further indicate a data processing action to occur that will enable assembly of data for
a message, where the data is to be found, a designation of its structure, identification of buffer storage for
holding the data comprising the message pending dispatch, and further data which identifies where the data
is to be dispatched. The invention makes use of input control blocks (ICBs) and output control blocks (OCBs).
Each ICB and each OCB comprises a message. OCBs may be "chained" and, as such, define a series
of messages that have a sequence dependency that tracks the sequence of the chained blocks. The invention
enables the ordered sequence of messages defined by the chained control blocks to invariably be retained
during a transmission, even if the transmission is interrupted and later recommenced.

SOFTWARE CONTROL BLOCK DATA STRUCTURES

A description is hereafter provided of the control data structures that are employed in the node of Fig. 2.
In Figs. 3a-3d, combined hardware/software block diagrams illustrate control block data structures which
enable both data messages and control messages to be dispatched from a source node and received at a
destination in the required order.

Referring to Figs. 3a and 3b, each node includes an input stem 50 and an output stem 52 that, respectively,
handle incoming data messages and outgoing data messages. Figs. 3c and 3d illustrate output and input stems,
respectively, for control messages.

Input stem 50 (Fig. 3a) includes a hardware input port 54 which is matched by an equivalent hardware output
port 56 (see Fig. 3b) in output stem 52. Hardware input port 54 is a physical entity in buffer interface 35 (see
Fig. 2) that is used to manage processing and storage of in-bound data messages to a node. Hardware input
port 54 and hardware output port 56 both have a set of associated hardware registers (not shown) which receive
control data from control block data structures to be hereafter described. When all of the requisite control data
is inserted into the hardware input/output port registers, a particular data processing action can then be
accomplished (e.g., a message assembly and transmission), using the control data present in the registers.

Hardware input port 54 is associated with an input port table 58 that lists the many logical input ports
assigned to hardware input port 54. Each logical port is defined by an input port table entry (IPTE) 60, a portion
of whose data structure is shown in Table 1 below.

INPUT PORT TABLE ENTRY (IPTE)

- FIRST ICB
- LAST ICB
- FLAGS
- TAG 1
- POINTER TO OUTPUT HARDWARE PORT
- POINTER TO OUTPUT LOGICAL PORT

TABLE 1

An IPTE 60 includes a designation of a first input control block (ICB) required to commence a data
processing action (e.g., a message store action), and a designation of the last input control block (ICB) that
terminates the data processing action. Intermediate ICBs are determined by chaining values contained within the
individual control blocks and define messages whose order is to be maintained in a transmission. Thus each
ICB comprises a complete message and all control blocks chained from an ICB define data comprising the
message and constitute a transmission. ICBs also describe a data processing function and enable location of
data to be subjected to the data processing action.

An IPTE 60 further includes: flags which define interrupt conditions, status states, response states, etc.,
a "tag 1" value, and pointers to both output hardware port 56 and a logical output port. These entries are not
directly relevant to the functioning of this invention, but are presented for completeness' sake. Those skilled
in the art will realize that control messages received at a destination node prior to a data message's reception
enable the node to set up the various entries in IPTE 60 and all ICBs that are chained therefrom.

When an input data message is received by hardware input port 54, depending upon the required data
processing action, a series of ICBs 62 (Fig. 3a) are assigned by the software to enable the execution of the
required action. The data structure of relevant portions of an ICB is shown in Table 2 below.

INPUT CONTROL BLOCK (ICB)

- NEXT ICB POINTER
- FLAGS (e.g. ENABLE COMPLETION INTERRUPT)
- TAG 1
- SECTOR LENGTH
- SECTOR COUNT
- START TDVE
- END TDVE
- TDV/BCB POINTER

TABLE 2

Each ICB 62 includes a next ICB pointer which is an address value of a next ICB data structure. It is this
next ICB pointer value which accomplishes a chaining action between ICBs. The pointer to the first ICB, as
above indicated, is contained in an IPTE 60. When the first ICB is accessed through use of that pointer, all
ICBs associated with the macroscopic data processing action can be determined by succeeding ICB pointers
that are included in ICBs that are chained. An ICB defines, by virtue of various flags contained within it, a
particular input-related data processing action to be performed.
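
As an illustration of how the first ICB, last ICB and next ICB pointer entries fit together, the following Python sketch walks a chain of ICBs starting from an IPTE. The field names are loose transliterations of Tables 1 and 2, object references stand in for address values, and the helper `icbs_for_action` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ICB:
    name: str                          # label used only for this illustration
    flags: int = 0                     # FLAGS
    next_icb: Optional["ICB"] = None   # NEXT ICB POINTER


@dataclass
class IPTE:
    first_icb: ICB                     # FIRST ICB
    last_icb: ICB                      # LAST ICB


def icbs_for_action(ipte: IPTE) -> List[ICB]:
    """Follow next-ICB pointers from the first ICB up to and including the last."""
    chain = []
    icb = ipte.first_icb
    while icb is not None:
        chain.append(icb)
        if icb is ipte.last_icb:
            break
        icb = icb.next_icb
    return chain


# Usage: three chained ICBs; a fourth block beyond the last ICB is ignored.
icb2 = ICB("icb2", next_icb=ICB("unrelated"))
icb1 = ICB("icb1", next_icb=icb2)
icb0 = ICB("icb0", next_icb=icb1)
entry = IPTE(first_icb=icb0, last_icb=icb2)
names = [icb.name for icb in icbs_for_action(entry)]
print(names)
```

The walk visits the blocks strictly in chaining order, which is the order in which the corresponding messages are to be maintained.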

An ICB further includes information that enables location of data within a disk drive track, i.e., sector length,
sector count and a track descriptor vector (TDV) pointer. A TDV 64 is a table which includes entries that define
a logical disk track that may comprise a plurality of physical disk tracks. TDV 64 includes one or more track
descriptor vector elements (TDVEs) 66. Each TDVE 66 is a control block which describes a physical disk
record's format on a disk drive.

In addition to a start TDVE pointer, an ICB also includes an end TDVE pointer so that all records required
for the ICB action are identified by data within or accessible from the ICB. Further control data is present in
an ICB, but is not relevant to the invention described herein.

As above indicated, each ICB includes a pointer to a start TDVE 66. The TDVE data structure is illustrated 
in Table 3 below and contains a description of a record on a track. 

TRACK DESCRIPTOR VECTOR ELEMENT (TDVE)

- FIELD 1 DATA ID (e.g. COUNT)
- FIELD 2 LENGTH (e.g. KEY)
- FIELD 3 LENGTH (e.g. DATA)
- FLAGS
- FIRST BCB
- TAG 1
- TAG 2
- RECORD NUMBER

TABLE 3

Assuming that records on a disk track are arranged using the known "Count, Key, Data" structure, a TDVE
will include field descriptors for each of the Count, Key and Data fields. The Count key field will include the
record count number that occurs in field 1 of the record; the field 2 value will include the length of the record
name (i.e., the Key); and the field 3 value will indicate the length of data in the data portion of the disk record.

As with other control blocks (remembering that each TDVE 66 is a control block), flags are included in a
TDVE 66 which define interrupt states, control states, etc. A TDVE 66 further includes a pointer to a first buffer
control block (BCB) 68. A BCB 68 includes control data to enable set up and assignment of physical buffer
space to be employed during a data write action (for example) and enables accomplishment of the individual
actions needed to assemble a received message for writing to disk. As will be hereafter apparent, BCBs may
also be chained and the invention assures that their dispatch in a transmission is in their order of chaining,
even in the event of an interrupted transmission.

A TDVE 66 next includes a tag 1 value (as aforedescribed) and also a tag 2 value that enables subsequent
control blocks to be identified as properly associated in the macroscopic data processing action.

As indicated above, each TDVE 66 includes a pointer to a first buffer control block (BCB) 68 that defines
what portion of memory should be allocated as a buffer for the write action (for example). A BCB 68 data
structure is shown in Table 4 below.

BUFFER CONTROL BLOCK

- NEXT BCB POINTER
- DATA BYTES IN BUFFER
- TAG 1/2
- BUFFER SIZE
- FLAGS
- BUFFER ADDRESS

TABLE 4

A BCB 68 data structure commences with a pointer to a next BCB, it being realized that a plurality of buffer
locations may be allocated to a data write/data read operation. Referring back to Fig. 3a, assume that an ICB
62 includes a pointer to TDV 64, with TDVE 0 defining a first record required to accomplish a data write action.
Recall that ICB 62 includes both a start TDVE pointer and an end TDVE pointer which, in the case shown in
Fig. 3a, is TDVE N. Each TDVE 66 further includes a pointer to a BCB that defines an amount of buffer space
(e.g. buffer 70) required to store the data record. Other TDVEs may include a pointer to a plurality of chained
BCBs 72, 74, 76, which define additional buffer areas within memory to be allocated.

Returning to Table 4, each BCB data structure includes a next BCB pointer that enables a chaining of
BCBs. A next value in a BCB data structure defines the number of data bytes stored in the physical buffer
space. A further entry is a tag 1 or a tag 2 value (not used for this invention). Each BCB data structure further
includes a designation of the required buffer size, flags for various control functions and the address of the
first buffer address in the buffer memory.
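
The role of the BCB fields can be made concrete with a short Python sketch in which a record is spread over chained buffers and later read back in chain order. The field names follow Table 4; the functions `fill_chain` and `read_chain`, the flat `bytearray` used as buffer memory, and the chosen addresses and sizes are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BCB:
    buffer_address: int                # BUFFER ADDRESS (offset into memory)
    buffer_size: int                   # BUFFER SIZE
    data_bytes: int = 0                # DATA BYTES IN BUFFER
    next_bcb: Optional["BCB"] = None   # NEXT BCB POINTER


def fill_chain(first: Optional[BCB], data: bytes, memory: bytearray) -> int:
    """Store data across the chained buffers in chain order; return bytes stored."""
    offset = 0
    bcb = first
    while bcb is not None and offset < len(data):
        chunk = data[offset:offset + bcb.buffer_size]
        start = bcb.buffer_address
        memory[start:start + len(chunk)] = chunk
        bcb.data_bytes = len(chunk)
        offset += len(chunk)
        bcb = bcb.next_bcb
    return offset


def read_chain(first: Optional[BCB], memory: bytearray) -> bytes:
    """Reassemble the data by visiting the buffers in chain order."""
    out = bytearray()
    bcb = first
    while bcb is not None:
        out += memory[bcb.buffer_address:bcb.buffer_address + bcb.data_bytes]
        bcb = bcb.next_bcb
    return bytes(out)


# Usage: three 4-byte buffers scattered through a 32-byte memory.
memory = bytearray(32)
third = BCB(buffer_address=20, buffer_size=4)
second = BCB(buffer_address=8, buffer_size=4, next_bcb=third)
first = BCB(buffer_address=0, buffer_size=4, next_bcb=second)
stored = fill_chain(first, b"0123456789", memory)
restored = read_chain(first, memory)
print(stored, restored)
```

The buffers need not be contiguous; only the chaining order determines how the message portions are reassembled.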

Two additional control block structures are employed in the output stem to enable dispatch of messages.
As shown in Fig. 3b, those control block structures are output control blocks (OCBs) 80 and output port table
entries (OPTEs) 82. OCB and OPTE control block data structures are illustrated in Tables 5 and 6 and enable
each unit of data accessed from disk to be provided to hardware output port 56 in output stem 52 (Fig. 3b).

OUTPUT CONTROL BLOCK (OCB)

- NEXT OCB POINTER
- START TDVE
- END TDVE
- FLAGS
- TAG 1
- DESTINATION ADDRESS
- LOGICAL INPUT PORT ADDRESS AT DEST.
- MESSAGE DATA (FOR CONTROL)
- TDV/BCB

TABLE 5

OUTPUT PORT TABLE ENTRY (OPTE)

- START OF OCB CHAIN
- END OF OCB CHAIN
- FLAGS
- NEXT OPTE
- INPUT PHYSICAL PORT
- INPUT LOGICAL PORT

TABLE 6


An OCB 80 data structure (Table 5 and Fig. 3b) includes a pointer to a next OCB. It also includes a pointer
to TDV table 84, a start TDVE pointer and an end TDVE pointer. Those pointers, in combination, enable
identification of all TDVEs 86 which define data stored in various buffers 88 to be accessed (via pointers to BCBs
90 contained in each pointed-to TDVE and intermediate TDVEs). Next, flags are included which define various
control functions and interrupt states.

An OCB 80 further includes a destination address for the data and a logical input port address at the
destination where the data is to be directed. Under certain circumstances, an OCB 80 may also include control
message data to enable control information to be transmitted to a destination address.

Table 6 illustrates an OPTE 92 data structure which is substantially similar to an IPTE 60 but with reference
to OCBs 80 that are chained to provide outgoing data. An OPTE 92 includes a pointer to a start of an OCB
chain and a pointer to the end of the OCB chain. Flags are included which define interrupt states and other
control functions. An OPTE 92 also includes a pointer to a next OPTE 92 so as to enable a chained series of
OPTEs to be fed to the output via the ready queue. Pointers are also included to the input physical port and
the input logical port and are used for functions unrelated to this invention.
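
A rough Python rendering of the OPTE and OCB relationship may help. It models only the start-of-chain, end-of-chain, next-OCB and next-OPTE entries of Tables 5 and 6; the `append` method and the `messages_in_order` generator are hypothetical conveniences, not part of the described hardware.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class OCB:
    name: str                            # label used only for this illustration
    destination: str                     # DESTINATION ADDRESS
    next_ocb: Optional["OCB"] = None     # NEXT OCB POINTER


@dataclass
class OPTE:
    name: str
    start_ocb: Optional[OCB] = None      # START OF OCB CHAIN
    end_ocb: Optional[OCB] = None        # END OF OCB CHAIN
    next_opte: Optional["OPTE"] = None   # NEXT OPTE

    def append(self, ocb: OCB) -> None:
        """Chain a further OCB (message) onto the end of this transmission."""
        if self.start_ocb is None:
            self.start_ocb = ocb
        else:
            self.end_ocb.next_ocb = ocb
        self.end_ocb = ocb


def messages_in_order(opte: OPTE) -> Iterator[str]:
    """Yield the messages of one transmission in their chained order."""
    ocb = opte.start_ocb
    while ocb is not None:
        yield ocb.name
        ocb = ocb.next_ocb


# Usage: one OPTE whose chain holds three sequence dependent messages.
opte_a = OPTE("A")
for label in ("msg 1", "msg 2", "msg 3"):
    opte_a.append(OCB(label, destination="Node E"))
chain = list(messages_in_order(opte_a))
print(chain)
```

Keeping an end-of-chain pointer lets a new message be chained in constant time while the start-of-chain pointer still gives the dispatch order.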

The above description has considered control block data structures needed to accomplish data message
transfers. As shown in Figs. 3c and 3d, similar control blocks are used to enable dispatch and receipt of control
messages. However, due to the relative simplicity of control messages, the use of a TDV table (and its TDVEs)
is unnecessary. As a result, in a control message source node (Fig. 3c), an OCB 80 includes a pointer to a
BCB 100 that defines a first portion of a control message that is stored in buffer 102. Additional BCBs 104,
106, etc. may be chained from BCB 100. Similarly, in a control message destination node (Fig. 3d), an ICB
62 includes a pointer directly to a BCB 108, and indirectly to chained BCBs 110 and 112. These control block
structures enable a control message to be assembled in a source node and to be received and stored in a buffer
in a destination node.

MESSAGE ASSEMBLY 


Assume that Node A (see Fig. 1) is to commence a transmission of a control message to Node E. At such
time, microprocessor 22 in Node A commences assembly of control blocks which will enable the control
message to be dispatched via its output stem. As indicated above, the invention uses the concept of a transmission.
All messages to be dispatched during a single transmission (either continuously or discontinuously) are
sequence dependent and are invariably transmitted in the order of sequence dependency. Each OCB defines a
message by pointing to one or more BCBs that further define portions of the message, all of which are
sequence dependent. Transmissions, per se, have no ordered priority and may be dispatched in any sequence.

A requirement to dispatch a control message causes generation of an OPTE 92 (see Fig. 3c) which includes
a pointer to an OCB 80 that, in turn, points to all BCBs that define the control message to be dispatched during
a single transmission. The chained BCBs in turn define buffer store areas containing the actual control
message. For instance, in Fig. 3c, OCB 80 has BCBs 100, 104, 106 chained therefrom, each pointing to a separate
buffer memory area.

As a result of an earlier dispatched control message, Node E, in preparation to receive the control message
from Node A, assigns a logical input port to receive the control message and defines an IPTE data structure, an
ICB data structure and BCB structures. Those logical data structures enable the control message
to be stored in the defined buffer areas. Such structures are illustrated in Fig. 3d wherein IPTE 60, ICB 62
and BCBs 108, 110 and 112 will enable storage of the control message received during a transmission.

Once all of the aforesaid data structures have been created and are present in control block registers within
hardware output port 56 (Fig. 3c) and hardware input port 54 (Fig. 3d), the actual data processing action can
take place to accomplish the transmission between source Node A and destination Node E.

HARDWARE/SOFTWARE CONTROL OF MESSAGE TRANSMISSION ORDERING

Referring to Fig. 4, the procedure will be described for assuring control message ordering within a
transmission. Initially assume that output port table 30 in Node A contains OPTEs A-D which define, via chained
OCBs, message data that is to be transmitted during a plurality of transmissions. For instance, OPTE A includes
a pointer to OCB 100, which includes a pointer to OCB 102 and, indirectly, to OCB 104, etc. OCB 100 includes
a pointer to BCB 106 which in turn has BCBs 108 and 110 chained therefrom. BCBs 106, 108 and 110 include
pointers to storage areas within control message buffers 112 which include control message data to be
dispatched via network interface 37 to a destination node.

The control message data pointed to by OCB 100 must be dispatched to and acknowledged by the
destination node before the control message data pointed to by OCB 102 can be dispatched and before the
message data pointed to by OCB 104 can be dispatched. Note, however, that output port table 30 includes further
OPTEs which are independent and form a portion of other messages. As such, data pointed to by entries in
OPTEs B, C, D may be transmitted in any order, so long as data pointed to by an OPTE (and any subsidiary
OCBs) is invariably transmitted in order of dependency.
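
The ordering constraint stated above (any interleaving between OPTEs, strict order within an OPTE) can be expressed as a small checker over a dispatch trace. The function `respects_dependencies` and the trace format are hypothetical; the labels for OPTE A follow Fig. 4, while the single message `b1` for OPTE B is invented for the example.

```python
def respects_dependencies(chains, trace):
    """Check a dispatch trace against the chained order of each OPTE.

    chains: dict mapping an OPTE name to its messages in chained order.
    trace:  list of (OPTE name, message) pairs in the order dispatched.
    Returns True when, within every OPTE, messages were dispatched in
    chained order; interleaving between different OPTEs is allowed.
    """
    position = {name: 0 for name in chains}
    for name, message in trace:
        expected = chains[name]
        index = position[name]
        if index >= len(expected) or expected[index] != message:
            return False
        position[name] = index + 1
    return True


chains = {
    "A": ["OCB 100", "OCB 102", "OCB 104"],
    "B": ["b1"],
}

# OPTE B is served between two messages of OPTE A: allowed.
interleaved = [("A", "OCB 100"), ("B", "b1"), ("A", "OCB 102"), ("A", "OCB 104")]
# OCB 102 is sent before OCB 100: violates the sequence dependency.
reordered = [("A", "OCB 102"), ("A", "OCB 100")]

print(respects_dependencies(chains, interleaved))
print(respects_dependencies(chains, reordered))
```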

Within memory interface 24 is hardware output port 56 which includes a plurality of control block registers
114, 116, 118, etc. Each control block register is loaded with control information from the indicated control block
so as to enable operation of the system during execution time. Hardware output port 56 further includes a pair
of ready queue registers 120 and 122. Ready queue register 120 is reserved for control messages evidencing
priority A and, in this instance, it is assumed that control messages chained from OPTE A and OPTE B are
priority A and are ready for dispatch to a destination node or destination nodes. Ready queue register 122
contains pointers to OPTE C and OPTE D which are also ready for dispatch but evidence a lower priority, i.e.,
priority B. Hardware output port 56 further handles control message dispatch and reception of acknowledgements
from a destination node.

In operation, hardware output port 56 initially accesses the pointer to OPTE A from priority A ready queue
register 120. The control block data comprising OPTE A, OCB 100, BCBs 106, 108 and 110 are loaded into
control block registers 114-118, etc., respectively. Under control of entries in these registers, data from control
message buffer 112 is accessed and transmitted via hardware output port 56.

As above indicated, a transmission-level sequence number is assigned to OPTE A and is increased by
one for every message portion sent out during the transmission for OPTE A. Assume that the data pointed to
by OCB 100 and BCBs 106 and 108 are transmitted, but a disconnect occurs before the message portion in
the buffer pointed to by BCB 110 can be transmitted. The disconnect procedure causes the destination node
to acknowledge receipt of the last successfully received data. Microprocessor 22 stores the status of the
message and causes hardware output port 56 to go to the next entry in the priority A ready queue 120. As a result,
a new transmission is commenced for OPTE B. OPTE A remains listed in ready queue register 120 and will
be later accessed when reached after hardware output port 56 has handled other priority A entries in ready
queue register 120.

A similar search of the ready queues occurs if an entire control message (as manifest by OCB 102,
for example) is transmitted and acknowledged, but no path can be found to transmit the control message
represented by OCB 104. Note that a destination node acknowledges receipt of a message (as defined by an OCB)
only when all buffers and BCBs chained from the OCB are received. Only when a disconnect occurs during a
BCB transmission, or between BCBs, does the destination node acknowledge the last BCB received.

When OPTE A is again reached in priority A ready queue register 120, the status is retrieved and the
transmission is resumed. This procedure continues until all control data linked to OPTE A has been dispatched and
has been acknowledged. In this manner, all control message portions within a transmission are assured of
sequential transmission.
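
The disconnect-and-resume behaviour can be sketched as follows. This is a simplified model under stated assumptions, not the hardware procedure itself: `Transmission`, `run_queue` and `flaky_link` are hypothetical names, the saved status is reduced to a single index that also serves as the sequence number, and an interrupted entry is modelled by re-appending it to a software queue rather than by leaving it in a register.

```python
from collections import deque


class Transmission:
    def __init__(self, name, portions):
        self.name = name
        self.portions = portions  # already in dependency order
        self.next_index = 0       # saved status: first unacknowledged portion

    def done(self):
        return self.next_index >= len(self.portions)


def run_queue(ready, link_up):
    """Dispatch all transmissions; link_up(name, index) reports whether a portion gets through."""
    sent = []
    queue = deque(ready)
    while queue:
        tx = queue.popleft()
        while not tx.done():
            if not link_up(tx.name, tx.next_index):
                break  # disconnect: status stays at the last acknowledged portion
            sent.append((tx.name, tx.next_index, tx.portions[tx.next_index]))
            tx.next_index += 1  # sequence number advances by one per portion
        if not tx.done():
            queue.append(tx)  # entry is revisited after the other entries
    return sent


failures = {("A", 2)}  # the portion behind BCB 110 is interrupted once


def flaky_link(name, index):
    if (name, index) in failures:
        failures.discard((name, index))
        return False
    return True


a = Transmission("A", ["BCB106", "BCB108", "BCB110", "OCB102", "OCB104"])
b = Transmission("B", ["b0", "b1"])
log = run_queue([a, b], flaky_link)
```

In this run the transmission for B is interleaved after the disconnect, yet the portions of A still leave in index order 0 through 4, which is the property the paragraph above states.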

Priority B ready queue register 122 is reached if all entries in priority A ready queue register 120 have been
successfully handled, or if all work in priority A ready queue register 120 has been tried but for some reason blocked.
Assume that all the priority A destination nodes are busy. This results in all the queue A transmissions failing
to establish connection. If this happens, the queue management hardware in hardware output port 56 tries a
lower priority queue for work (i.e., priority B ready queue 122). If any priority B message makes it through to a
destination node and completes, then the queue manager goes back to the top of priority queue A and tries to
schedule the work again. This repeats until all the work completes.
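
The fallback between the two ready queues can be illustrated with a short sketch. It assumes a software loop in place of the queue management hardware; `schedule` and `try_send` are hypothetical names, and the early exit when nothing can progress is an addition made only to keep the example finite.

```python
def schedule(queue_a, queue_b, try_send):
    """try_send(entry) returns True when that entry's transmission completes."""
    pending_a, pending_b = list(queue_a), list(queue_b)
    order = []
    while pending_a or pending_b:
        progressed = False
        # Priority A work is always attempted first.
        for entry in list(pending_a):
            if try_send(entry):
                pending_a.remove(entry)
                order.append(entry)
                progressed = True
        # Queue B is reached only after every A entry was handled or tried.
        for entry in list(pending_b):
            if try_send(entry):
                pending_b.remove(entry)
                order.append(entry)
                progressed = True
                break  # one B completion sends the scheduler back to queue A
        if not progressed:
            break  # nothing can move right now; a real port would retry later
    return order, pending_a + pending_b


busy = {"A": 1, "B": 1}  # each priority A destination is busy on the first attempt


def try_send(entry):
    if busy.get(entry, 0) > 0:
        busy[entry] -= 1
        return False
    return True


order, stuck = schedule(["A", "B"], ["C", "D"], try_send)
```

Here both priority A entries are blocked on the first pass, so OPTE C completes first, after which the scheduler returns to queue A before taking up OPTE D.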

As above indicated, each subsequent control message (i.e., OCB) is dispatched by hardware output port
56 when an acknowledgement is received from a destination node indicating successful message receipt at
the destination node. Thus no subsequent message can be transmitted until a previous message is
acknowledged. At the destination node, sequence numbers of message portions received in a transmission are
tracked so as to know which message portions are duplicates and which are not.
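
Duplicate detection at the destination node can be sketched as below. The comparison rule (discard a portion whose sequence number is equal to or less than the highest one already received for that transmission) follows the claims; the `Receiver` class, the dictionary keyed by transmission, and numbering that starts at zero are assumptions made for illustration.

```python
class Receiver:
    def __init__(self):
        self.highest = {}   # transmission id -> highest sequence number accepted
        self.accepted = []  # portions kept, in arrival order

    def receive(self, tx_id, seq, payload):
        """Return True if the portion is new, False if it is a duplicate."""
        if seq <= self.highest.get(tx_id, -1):
            return False  # already seen: discard
        self.highest[tx_id] = seq
        self.accepted.append((tx_id, seq, payload))
        return True


rx = Receiver()
arrivals = [(0, "p0"), (1, "p1"), (1, "p1"), (2, "p2")]  # portion 1 is resent
results = [rx.receive("A", seq, payload) for seq, payload in arrivals]
```

A resend of this kind is what the receiver would see if a disconnect hid an acknowledgement from the source node, so the check keeps a resumed transmission from delivering the same portion twice.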

The above description has concerned the dispatch and receipt of control messages in a predetermined
order. An identical procedure is implemented for data messages by buffer interface 35; however, in this case,
the control block structures include TDV tables and respective TDVE entries. Otherwise the procedures are
identical and assure that portions of a data message defined by chained OCBs are dispatched in dependency
order.

It should be understood that the foregoing description is only illustrative of the invention. Various
alternatives and modifications can be devised by those skilled in the art without departing from the invention. For
instance, while the invention has been described in the context of the dispatch of messages from a source
node to one destination node, the invention assures that multiple destination nodes will receive message
portions comprising a transmission in dependency order.



Claims 

1. A node for transmitting sequence dependent messages in a required sequence in a computing system
comprising a plurality of like nodes connected by a communications network comprising:

communication interface means for exchanging messages with other nodes;

message queue means for arranging a plurality of message entries in a queue, each message entry
comprising a pointer to an output message control structure;

memory for storing a plurality of output message control structures, each output message control
structure indicating one or more chained further message control structures that define message portions
which together comprise a message, each output message control structure further enabled to be chained
to other message output control structures, a plurality of message control structures that are chained
exhibiting a sequence dependency; and

processor means controlled by message entries in said message queue means for dispatching at
least first and second messages under control of first and second chained output message control
structures, respectively, said processor means enabling dispatch of said second message only after successful
transmission of said first message upon which said second message portion is sequence dependent.

2. A node as claimed in claim 1 wherein each message entry in said message queue means defines a
message transmission, all message portions designated by said output message control structure that is
pointed to by a said message entry exhibiting sequence dependency and being dispatched under control
of said processor means strictly in accordance with said sequence dependency.

3. A node as claimed in either of claims 1 or 2 wherein said message queue means arranges said message
entries in a plurality of queues of different priority orders, said processor means responsive to an
unsuccessful dispatch of a first message to discontinue a transmission that includes further portions of said
first message and to commence a transmission of a second message in accordance with a further
message entry queued in a like priority message queue means.

4. A node as claimed in any preceding claim, wherein said processor means, after discontinuance of a
subsequent transmission as a result of a successful or unsuccessful dispatch of a message, again attempts
a transmission of a message comprising another transmission until all message portions included in said
another transmission are successfully dispatched.

5. A node as claimed in any preceding claim, wherein said processor means, upon receiving an indication,
via said communication means, of an acknowledgement of successful receipt by another node of one
message, enables dispatch of a succeeding message that is sequence dependent upon said one message.

6. A node as claimed in any preceding claim, wherein said processor means assigns each received message
portion a sequential number, said processor operational, when said node receives message portions from
another node, to check a sequence number included with said message portion against a highest
sequence number of message portions received previously for a transmission, and to discard any message
portion whose sequence number equals or is less than said highest sequence number for said
transmission.

7. A computing system comprising a plurality of nodes as claimed in any of claims 1 to 6 connected by a
communication network.

8. A method for transmitting sequence dependent messages in a required sequence in a computing system
comprising a plurality of nodes connected by a communications network, the method comprising the steps
of:

a. arranging a plurality of message entries in a queue, each message entry comprising a pointer to an
output message control structure;

b. storing a plurality of output message control structures, each output message control structure
indicating one or more chained further message control structures that enable access to data comprising
a message or a portion of a message, a plurality of said output message control structures that are
chained and a plurality of said further message control structures that are chained from an output
message control structure all exhibiting a sequence dependency; and

c. dispatching, under control of message entries in said message queue means, at least first and
second messages designated by chained output message control structures, and enabling dispatch of said
second message only after successful transmission of said first message upon which said second
message is sequence dependent.

9. A method as claimed in claim 8, wherein each message entry in said message queue means defines a

A method as claimed in any of claims 8, 9, 10, 11 or 12, further comprising the steps of: